═══ 1. Sslurp! ═══

Sslurp! 1.1

Sslurp! can retrieve Web pages from an HTTP (WWW) server. It can be
configured to follow all hyperlinks on a page that lead to other pages
on the same server. Images on the pages can be retrieved as well. All
pages are stored on disk and can be viewed later using your web
browser. Sslurp! can make use of a proxy HTTP server, speeding up the
whole procedure.

Sslurp! requires at least one HPFS partition!

Topics:
   The main window
   Common tasks
   Command line options
   For the techies
   Contacting the author

═══ 1.1. The main window ═══

On the main window you find the following elements:

o A drop-down list where you enter the URL. The last 15 URLs are
  saved. You can quickly enter a URL here by dragging a URL object
  from a WPS folder to this entry field.
o "Start", "Stop" and "Skip" buttons.
o A log window. Its contents are also stored in the log file.
o A status line. Its contents are:
  - the current URL
  - total number of data bytes retrieved
  - total number of data bytes of the current URL
  - number of bytes retrieved of the current URL
  - number of URLs retrieved
  - number of URLs tried
  - number of URLs queued for inspection (estimated)

═══ 1.2. Common tasks ═══

Here's how to perform some common tasks with Sslurp!:

I want to suck a complete web site.
  In the setup, enable "Follow links" and "Inline images". Disable
  "Don't climb up". Then enter the root URL of the site (e.g.
  "http://www.thesite.com/") and press "Start".

I want to suck a subrange of a web site.
  In the setup, enable "Follow links", "Inline images" and "Don't
  climb up". Then enter the URL of the starting page (e.g.
  "http://www.thesite.com/some/path/start.html") and press "Start".

I want to suck a single web page with images, but only if it has
changed.
  In the setup, disable "Follow links". Enable "Inline images" and
  "Modified pages only". Then enter the URL of the page (e.g.
  "http://www.thesite.com/pageofinterest.html") and press "Start".

═══ 1.3. Command line options ═══

Sslurp! can be run in automated mode, i.e. it takes one or more URLs
as program parameters, downloads these pages according to the program
options, and exits when finished.

The command line syntax is:

  SSLURP.EXE [options] [<URL> | @<listfile>]*

In other words, you can specify

o options,
o one or more URLs, and
o one or more list files.

Each line in a list file is interpreted as a URL. Empty lines and
lines starting with ';' are ignored.

The following command line options are supported:

-T<dir>           Retrieved items are stored in the given directory.
-L-               No links are followed.
-Ls               Only links to the same server are followed.
-Ld               Only links that are not pointing upward are
                  followed.
-La               All links are followed.
-E"<extensions>"  Only links with one of the given file extensions
                  are followed.
-X"<extensions>"  Only links to items not having one of the given
                  file extensions are followed.
-I+               Inline images are downloaded.
-I-               Inline images are not downloaded.
-Ia               Inline images are downloaded, even those on
                  different servers.
-A+               Applets are downloaded.
-A-               Applets are not downloaded.
-Aa               Applets are downloaded, even those on different
                  servers.
-U+               Only items newer than local copies are downloaded.
-U-               All items are downloaded.
-S<size>          Restricts downloaded items to <size> bytes.
-S-               Downloads are not restricted by size.
-D<depth>         Restricts followed links to <depth> steps.
-D-               Downloads are not restricted by link depth.
-P+               Uses the proxy server.
-P-               Does not use the proxy server.

Note: Command line options override options given in the setup. For
options not given in the command line, the setup options are used. So
if an option is turned on in the setup, you must explicitly switch it
off to deactivate it. It's not sufficient to just leave out the
command line option! Stored options are not modified by command line
options.
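Example: a hypothetical unattended run (the target directory, the list
file and the URL below are placeholders, and each option's argument is
assumed to follow the option letter directly, as in the syntax line
above):

  SSLURP.EXE -TD:\SUCKED -Ls -I+ -U+ http://www.thesite.com/

This stores a complete site below D:\SUCKED, follows only links on the
same server, downloads inline images, and skips items that are not
newer than the local copies.

  SSLURP.EXE -TD:\SUCKED -L- -I+ @URLS.LST

This downloads every page listed in URLS.LST (one URL per line) with
its inline images, without following any links.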
When finished, Sslurp! returns one of the following ERRORLEVEL values:

   0   Everything OK
   1   Invalid command line option
   2   Problem(s) with one of the list files
  10   Other error
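These return codes can be checked in a batch file. A minimal sketch
(the list file name is a placeholder; since "IF ERRORLEVEL n" means
"n or higher", the highest values must be tested first):

  @ECHO OFF
  REM Fetch the listed pages unattended, then report the result.
  SSLURP.EXE -L- -I+ @URLS.LST
  IF ERRORLEVEL 10 GOTO OTHER
  IF ERRORLEVEL 2 GOTO LISTFILE
  IF ERRORLEVEL 1 GOTO BADOPT
  ECHO Download finished OK.
  GOTO DONE
  :BADOPT
  ECHO Invalid command line option.
  GOTO DONE
  :LISTFILE
  ECHO Problem with one of the list files.
  GOTO DONE
  :OTHER
  ECHO Some other error occurred.
  :DONE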
═══ 1.4. For the techies ═══

Here's some technical information if you're interested:

o Sslurp! uses HTTP 1.0. HTTP 0.9 is not supported. If some web site
  is still using an HTTP 0.9 server, its contents may be just as
  outdated, so you might not miss anything. HTTP 1.1 server replies
  are recognized.
o Sslurp! only follows HTTP links, not FTP or others.
o Sslurp! counts <IMG> and <BODY BACKGROUND> references as inline
  images.
o If the file name of a retrieved page isn't specified, it's stored
  as INDEX.HTML.
o The "Last-Modified" timestamp is stored in the file's extended
  attributes. The EA name is HTTP.LMODIFIED, and it is of type
  EAT_ASCII.
o Some characters in the URL are converted when building the path
  name of the file. However, no conversion to FAT (8.3) names is
  performed!
o If a page is redirected, the redirection is automatically followed,
  but only if the new location is on the same server!
o Sslurp! has been developed on and tested with OS/2 Warp 4.0. It
  should also work with the following configurations:
  - Warp 3.0 with IAK
  - Warp 3.0 with TCP/IP 2.0
  - Warp 3.0 Connect (TCP/IP 3.0)
  - Warp Server

═══ 1.5. Contacting the author ═══

Sslurp! was developed by Michael Hohner. He can be reached
electronically at:

  EMail:   miho@osn.de
  Fidonet: 2:2490/2520.17

═══ 2. File menu ═══

Exit
  Ends the program.

═══ 3. Setup ═══

Options
  Specify all program options.

═══ 3.1. Servers ═══

Proxy
  Enter the host name of a proxy HTTP server. You may also specify a
  port number for the proxy server. Check "Enable" to actually use
  the proxy. Contact your service provider to get this data.
  Note: Only enter the host name, not the URL (e.g. "proxy.isp.com",
  not "http://proxy.isp.com:123/")!

User name
  Enter your user ID here if your proxy server requires
  authentication.

Password
  Password for proxy authentication.

Email address
  Enter your EMail address. It is included in every request. Don't
  enter anything here if you don't want your EMail address to be
  revealed.

═══ 3.2. Paths ═══

Path for retrieved data
  Path where retrieved pages and images are stored. This path and
  subpaths are created automatically.

═══ 3.3. Logging ═══

These options control logging.

Log file
  Path and name of the log file.

Additional information
  Log additional (but somewhat optional) messages.

Server replies
  Log reply lines sent by the server.

Debug messages
  Log messages used for debugging purposes (turn on if requested).

═══ 3.4. Links ═══

none
  No links are followed.

all
  All links (even those to other servers) are followed. Be very
  careful with this option!

same server
  Only links to items on the same server are followed.

don't climb up
  Hyperlinks to items that are hierarchically higher than the initial
  URL are not followed. Otherwise, all links to items on the same
  server are followed.
  Example: If you started with http://some.site/dir1/index.html, and
  the current page is http://some.site/dir1/more/levels/abc.html, a
  link that points to http://some.site/otherdir/index.html wouldn't
  be followed, but a link to http://some.site/dir1/x/index.html
  would.

all types
  All types of links are followed, restricted only by the above
  settings.

including
  You can enter a set of extensions (separated by spaces, commas or
  semicolons) of items to retrieve. Links to items with other
  extensions are ignored.
  Example: With "htm html", Sslurp! only follows links to other HTML
  pages, but does not download other hyperlinked files.

excluding
  Reverse of the above option. Only links to items not having one of
  the given extensions are followed.

Max link depth
  Limits the depth of links to follow to the specified number. A
  level of "1" specifies the initial page.
  Example: If page A contains a link to B, and B contains a link to
  C, A would be level 1, B would be level 2 and C would be level 3.
  A maximum link depth of "2" would retrieve pages A and B, but
  not C.

Max size
  Limits the size of items to download. If the server announces the
  size and it's larger than the number specified, the item is
  skipped. If the server doesn't announce the size, the item is
  truncated when the maximum size is reached.

═══ 3.5. Options ═══

These settings influence which items will be downloaded and how it'll
be done.

Inline images
  If checked, inline images are also retrieved.

from other servers
  If checked, inline images located on other servers are also
  retrieved. Otherwise only images from the same server are
  downloaded.

Java applets
  If checked, Java applets are also retrieved.

from other servers
  If checked, applets located on other servers are also retrieved.
  Otherwise only applets from the same server are downloaded.

Retrieve modified items only
  An item is only retrieved if it's newer than the local copy.
  Strongly recommended!

═══ 3.6. Server list ═══

A list of base URLs is displayed. Press "New" to add a new URL with
settings. Press "Change" to change the settings of the selected URL.
Press "Delete" to delete the selected URL.

═══ 3.7. Server ═══

Base URL
  Set of URLs (this item and all items hierarchically below) for
  which these settings apply. This usually specifies a directory on
  a server.
  Example: If you enter "http://some.server/basedir", these settings
  apply to "http://some.server/basedir/page1.html", but not to
  "http://some.server/otherdir/b.html".

User name
  User name or user ID used for basic authorization.

Password
  Password used for basic authorization.

═══ 4. Help menu ═══

General help
  Provides general help.

Product information
  Displays name, version number, copyright information etc.

═══ 5. About ═══

This page intentionally left blank.